perf(native): scope incremental rebuild to truly-changed files (#1012) #1027
carlos-alm merged 5 commits into main
Conversation
Native 1-file incremental rebuilds were 4.3x slower than WASM (876ms vs 203ms) because `run_pipeline` re-parsed the full reverse-dep cone (47 files for a 1-file change), which cascaded into the insert/structure/roles/analysis phases. This PR adopts the WASM save+reconnect strategy: save reverse-dep → changed-file edges before purge, then reconnect them to the new node IDs after Stage 5. Reverse-dep files are no longer re-parsed; only their affected edges are reconstructed.

Result: 876ms → 43ms (95% faster, 0.78x WASM).

- insertMs: 185ms → 0.2ms
- structureMs: 73ms → 3.1ms
- rolesMs: 311ms → 18.3ms

Edge counts are identical between full and 1-file rebuilds (37134), confirming no edge loss or duplication. The existing `1-file incremental ratio` gate in `benchmark-parity-gate.mjs` (limit 1.5) enforces the ceiling.
PR Review: Native Incremental Rebuild Performance Optimization

Claude finished @carlos-alm's task: reviewing PR #1027 for performance improvements to native incremental rebuilds.
Root Cause Analysis

Problem Being Solved: Native 1-file incremental rebuilds were 4.3× slower than WASM (876ms vs 203ms) because `run_pipeline` re-parsed the full reverse-dep cone.

Why This Approach: The PR adopts WASM's proven save+reconnect strategy: save reverse-dep → changed-file edges before purge, reconnect to new node IDs after Stage 5. This eliminates the need to re-parse reverse-dep files; only their affected edges are reconstructed. The approach is well-tested in the WASM pipeline and produces identical edge counts.

Risk Assessment: LOW RISK. This adopts a proven strategy from the WASM engine rather than inventing new logic. The PR reports identical edge counts (37134) between full and incremental rebuilds, indicating no data loss.

Configuration/Test Integrity Analysis

✅ PASSED: no configuration changes detected that lower standards.
Test Coverage: Strong existing regression protection for this exact scenario:
Code Quality AssessmentArchitecture: Clean implementation following established patterns
Security: No security concerns identified
Performance: Implementation optimized for the stated goal
Backlog Compliance Assessment
Final Assessment
Greptile Summary

This PR ports the WASM/JS incremental rebuild optimization to the native pipeline: instead of re-parsing every reverse-dep file in the changed-file cone, it saves the reverse-dep → changed-file edges before purge and reconnects them to new node IDs after Stage 5, cutting 1-file incremental time from 876 ms to 43 ms. It also fixes the role-classification gap for removal-only incremental builds.

Confidence Score: 5/5

Safe to merge: no P0/P1 issues found; previous review threads are fully addressed. The logic is well-reasoned and mirrors the proven WASM path. Edge save→purge→reconnect ordering is correct, deduplication via INSERT OR IGNORE is in place, the dropped-edge diagnostic is now logged, and the removal-only classification gap is patched. No new correctness or security issues identified. No files require special attention.
Sequence Diagram

```mermaid
sequenceDiagram
    participant BP as build_pipeline
    participant CD as change_detection
    participant DB as SQLite
    Note over BP,DB: Incremental rebuild (new strategy)
    BP->>CD: save_reverse_dep_edges(changed_paths)
    CD->>DB: SELECT edges rdep→changed (pre-purge)
    DB-->>CD: SavedReverseDepEdge[]
    CD-->>BP: saved_edges
    BP->>CD: find_reverse_dependencies(removed_set) [if removals]
    CD->>DB: SELECT files that import removed files
    DB-->>CD: removal_reverse_deps
    BP->>CD: purge_changed_files(files_to_purge, &[])
    CD->>DB: DELETE nodes/edges for changed+removed files
    Note over BP,DB: Stage 4-5: parse & insert only changed files
    BP->>DB: parse_files_parallel(changed only)
    BP->>DB: do_insert_nodes (new node IDs)
    Note over BP,DB: Stage 7: rebuild edges for changed files
    BP->>DB: build_import_edges / build_call_edges
    BP->>CD: reconnect_reverse_dep_edges(saved_edges)
    CD->>DB: SELECT new node id by (name,kind,file,~line)
    CD->>DB: INSERT OR IGNORE edge (old_source_id, new_target_id)
    CD-->>BP: (reconnected, dropped)
    Note over BP,DB: Stage 8: classify roles
    BP->>DB: do_classify_incremental(changed_files + removal_reverse_deps)
```
Reviews (3). Last reviewed commit: "fix(native): reclassify reverse-deps of ..."
```rust
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
```
Dropped edges silently discarded
`reconnect_reverse_dep_edges` returns `(reconnected, dropped)`, but the return value is ignored here. Dropped edges occur when a target node can't be found after re-insert (e.g. a renamed or deleted symbol), and they are silently swallowed with no log output. In the normal "symbol still exists" path this is fine, but in practice it makes partial edge loss during an incremental rebuild very hard to diagnose.
Suggested change:

```rust
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
if !saved_reverse_dep_edges.is_empty() {
    let (reconnected, dropped) =
        change_detection::reconnect_reverse_dep_edges(conn, &saved_reverse_dep_edges);
    if dropped > 0 {
        eprintln!("[codegraph] reconnect_reverse_dep_edges: {reconnected} reconnected, {dropped} dropped (target nodes not found)");
    }
}
```
Fixed in c268648: the `reconnect_reverse_dep_edges` return value is now captured and an `eprintln!` warning is emitted when `dropped > 0`, so partial edge loss is diagnosable. The normal happy path (no drops) stays quiet.
```rust
Ok(new_id) => {
    if insert_stmt
        .execute(rusqlite::params![
            s.source_id,
            new_id,
            &s.edge_kind,
            s.confidence,
            s.dynamic,
        ])
        .is_ok()
    {
        reconnected += 1;
    }
```
`reconnected` counter inflated by INSERT OR IGNORE no-ops

`insert_stmt.execute(…).is_ok()` returns true both for a genuine insert (one row affected) and for a silently ignored duplicate (zero rows affected, because INSERT OR IGNORE suppresses the constraint error). The counter will overcount whenever two saved entries resolve to the same (source_id, new_target_id, kind). Since the return value is only used for diagnostics this is low-impact, but it makes the `(reconnected, dropped)` log misleading.
Suggested change:

```rust
// Before:
Ok(new_id) => {
    if insert_stmt
        .execute(rusqlite::params![
            s.source_id,
            new_id,
            &s.edge_kind,
            s.confidence,
            s.dynamic,
        ])
        .is_ok()
    {
        reconnected += 1;
    }

// After:
Ok(new_id) => {
    match insert_stmt.execute(rusqlite::params![
        s.source_id,
        new_id,
        &s.edge_kind,
        s.confidence,
        s.dynamic,
    ]) {
        Ok(n) if n > 0 => reconnected += 1,
        Ok(_) => {} // INSERT OR IGNORE: duplicate skipped
        Err(_) => dropped += 1,
    }
}
```
Fixed in 2132d99: replaced `is_ok()` with a match on the affected-row count. Only `Ok(n) if n > 0` increments `reconnected`; `Ok(0)` (the INSERT OR IGNORE no-op for duplicates) is now silently skipped instead of inflating the diagnostic counter. Genuine errors still increment `dropped`.
Codegraph Impact Analysis

3 functions changed → 0 callers affected across 0 files
INSERT OR IGNORE returns Ok(0) for duplicate-row no-ops, so the previous `is_ok()` check inflated `reconnected` whenever two saved entries resolved to the same (source_id, new_target_id, kind). Match on the rows-affected count and only increment when n > 0; leave the dropped counter for genuine errors. Diagnostic-only — no functional change to edge data. Impact: 1 functions changed, 0 affected
`reconnect_reverse_dep_edges` returns `(reconnected, dropped)` but the caller previously discarded the result. Dropped edges occur when a saved target node can't be re-located after re-insert (e.g. the symbol was renamed or deleted), and silently swallowing them makes partial edge loss during incremental rebuild hard to diagnose. Surface the count via eprintln when dropped > 0 — same observability pattern other native diagnostics use. Impact: 1 functions changed, 0 affected
```rust
let changed_paths: Vec<String> =
    parse_changes.iter().map(|c| c.rel_path.clone()).collect();

if !opts.no_reverse_deps.unwrap_or(false) {
    saved_reverse_dep_edges =
        change_detection::save_reverse_dep_edges(conn, &changed_paths);
}
```
Removed files' reverse-dep roles not reclassified
`changed_paths` is built from `parse_changes` only; `change_result.removed` is excluded. `save_reverse_dep_edges` therefore never saves edges from files that import a removed file (call it B→A, where A is deleted). Those edges are correctly purged when A's nodes are deleted. But because B never enters `file_symbols`, it also never enters `changed_files`, so `do_classify_incremental` never reclassifies it. B's fan-out silently decreases in the DB while its role record stays stale.

In the old code `changed_rel_paths` included removed files, `find_reverse_dependencies` found B, and B was re-parsed and landed in `changed_files` for reclassification. The new strategy intentionally skips re-parsing, but the classification gap for the removal case was not compensated. For a removal-only incremental build `changed_files` is empty, so neither `do_classify_incremental` nor `do_classify_full` is invoked, leaving every file that imported the deleted file with a permanently stale role until a future build touches it again.

A minimal fix: identify reverse-dep files of removed entries before purge (the existing `find_reverse_dependencies` with `change_result.removed` as the seed would work), then union those file paths into the seed passed to `do_classify_incremental`. No re-parse is needed; reclassification reads fan-in/fan-out from the current (post-purge) edges table.
There was a problem hiding this comment.
Fixed in b6baec0 — applied your suggested approach: before purge, compute reverse-deps of change_result.removed via find_reverse_dependencies and union them into the seed passed to do_classify_incremental. No re-parse needed; the classifier reads fan-in/fan-out from the post-purge edges table.
This also covers the removal-only build case (where changed_files would otherwise be empty and classification skipped entirely). Verified the change doesn't regress: 181 Rust unit tests + 23 incremental integration tests + 15 roles tests all pass locally. The WASM side at detect-changes.ts:467 already includes ctx.removed in its findReverseDependencies seed, so this restores parity.
When a file is removed during incremental build, its nodes are purged along with edges pointing at them. Files that imported the removed file were never re-parsed (correct, by design) but their role records became stale: fan-out dropped silently because do_classify_incremental was only seeded with parsed files. For removal-only builds, classification was skipped entirely. Compute reverse-deps of removed entries before purge (find_reverse_deps with change_result.removed as seed, mirroring WASM/JS) and union them into the seed passed to do_classify_incremental. No re-parse needed — the classifier reads fan-in/fan-out from the post-purge edges table. Impact: 1 functions changed, 0 affected
Summary

- `run_pipeline` re-parsed the full reverse-dep cone (47 files for a 1-file change), cascading into the insert/structure/roles/analysis phases.

Per-phase improvements

- insertMs: 185ms → 0.2ms
- structureMs: 73ms → 3.1ms
- rolesMs: 311ms → 18.3ms

Acceptance criteria (#1012)

- `benchmark-parity-gate.mjs` already checks the `1-file incremental ratio` at limit 1.5

Verification / Test plan

- Run `incremental-benchmark.ts` to confirm timing